GeoThermalCloud: Geothermal Machine Learning Analysis: Utah

This notebook is part of GeoThermalCloud.jl, a machine learning framework for geothermal exploration.


GeoThermalCloud installation

If GeoThermalCloud is not installed, first execute in the Julia REPL:

import Pkg
Pkg.add(["GeoThermalCloud", "NMFk", "Mads", "DelimitedFiles", "JLD",
         "Gadfly", "Cairo", "Fontconfig", "Kriging", "GMT", "Images"])

Load data

Set coordinates:

Data Locations

Plot data locations on a map:

Define data attributes

Attribute names could be taken from the header of the input file. However, these names are short.

To make the variables easier to interpret in the plots generated below, we define both short and long attribute names:
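A minimal sketch of the short-to-long name mapping, assuming three of the attributes mentioned in this notebook. The long names here are illustrative; the actual lists come from the input file header.

```julia
# Hypothetical short and long attribute names (illustrative subset only).
attributes = ["TDS", "Al", "pH"]
attributes_long = ["Total dissolved solids", "Aluminum", "pH"]

# Dictionary for looking up the long name of a short attribute name.
longname = Dict(zip(attributes, attributes_long))

println(longname["TDS"])  # prints "Total dissolved solids"
```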

Pre-processing

Set empty data entries to NaN:

Convert to Float32:
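The two pre-processing steps above can be sketched as follows, assuming the raw table is a `Matrix{Any}` in which empty cells appear as empty strings. `raw` is a hypothetical toy table, not the Utah data.

```julia
# Hypothetical raw table: empty strings mark missing measurements.
raw = Any[1.5 "" 7.2;
          "" 3.0 6.8]

# Replace empty entries with NaN.
cleaned = map(x -> x == "" ? NaN : x, raw)

# Convert every entry to Float32 (NaN becomes NaN32).
X = Float32.(cleaned)
```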

Rescale δO18 data (‰):
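The notebook does not spell out the rescaling here. One plausible approach, assumed purely for illustration, is to shift δO18 (which is typically negative when expressed in per mil) so that every observed value is positive before the later log-transformation. The values below are hypothetical.

```julia
# Hypothetical δO18 values in per mil (‰); NaN marks a missing sample.
d18O = Float32[-16.2, -14.8, NaN, -12.1]

# Assumed rescaling: shift so the smallest observed value becomes 1,
# keeping all observed values positive for a later log-transform.
observed = filter(!isnan, d18O)
shift = 1 - minimum(observed)
d18O_rescaled = d18O .+ shift   # NaN stays NaN
```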

Set variables for the number of attributes and points:

Plot histograms and compute data statistics:
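Per-attribute statistics on sparse data must skip the missing entries. A minimal sketch, assuming rows are locations, columns are attributes, and NaN marks missing values. `colstats` is a hypothetical helper, and the histogram plotting is left out.

```julia
using Statistics

# Summary statistics of one attribute, computed over observed values only.
# Assumes the column has at least one observed (non-NaN) value.
function colstats(col::AbstractVector{<:AbstractFloat})
    obs = filter(!isnan, col)
    return (n = length(obs),
            min = minimum(obs),
            max = maximum(obs),
            mean = mean(obs),
            missing_pct = 100 * count(isnan, col) / length(col))
end

# Hypothetical data: 3 locations × 3 attributes.
X = Float32[1.0 NaN 7.0;
            2.0 3.0 NaN;
            4.0 NaN 6.5]
attrnames = ["A", "B", "C"]

stats = [colstats(X[:, j]) for j in axes(X, 2)]
for (name, s) in zip(attrnames, stats)
    println(name, ": ", s)
end
```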

Note that many data entries for TDS, Al, and δO18 are missing.

Most commonly used ML methods cannot process datasets with this many missing entries. Even though the dataset is very sparse, our ML methods can analyze these inputs.
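Factorization methods can cope with missing entries by measuring the reconstruction error only over observed values. The sketch below illustrates this for a factorization X ≈ W * H; it is a generic masked objective, not necessarily the exact formulation used inside NMFk, and `masked_sqerror` is a hypothetical name.

```julia
# Squared reconstruction error of X ≈ W * H over observed entries only.
# NaN entries in X are treated as missing and contribute nothing.
function masked_sqerror(X::AbstractMatrix, W::AbstractMatrix, H::AbstractMatrix)
    R = W * H
    total = 0.0
    for idx in eachindex(X)
        if !isnan(X[idx])
            total += (X[idx] - R[idx])^2
        end
    end
    return total
end

# Toy example: W * H == [1 2; 2 4], and X[1, 2] is missing.
W = reshape([1.0, 2.0], 2, 1)
H = [1.0 2.0]
X = [1.0 NaN;
     2.0 4.0]
masked_sqerror(X, W, H)   # 0.0, because the missing entry is ignored
```

By contrast, an ordinary sum over all entries would return NaN as soon as a single value is missing.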

Furthermore, different attributes in the Great Basin dataset cover different areas.

This is demonstrated in the maps generated below.

Log-transformation

Attribute values are log-transformed to better capture their order-of-magnitude variability.

All attributes except Quartz, Chalcedony, and pH are log-transformed (Quartz and Chalcedony include negative values, which cannot be log-transformed).
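A minimal sketch of the selective log-transformation, assuming base-10 logarithms and a hypothetical three-attribute toy matrix. NaN entries remain NaN, and the excluded attributes are left untouched (calling `log10` on a negative number raises a `DomainError`).

```julia
# Hypothetical attributes and data (rows = locations, columns = attributes).
attributes = ["TDS", "Quartz", "pH"]
X = Float32[1200.0 -5.0 7.1;
            350.0  12.0 6.4;
            NaN    30.0 8.0]

# Attributes kept on their original scale (pH is itself a log-scale quantity).
nolog = Set(["Quartz", "Chalcedony", "pH"])

Xlog = copy(X)
for (j, a) in enumerate(attributes)
    if a ∉ nolog
        Xlog[:, j] .= log10.(X[:, j])   # NaN stays NaN
    end
end
```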

Normalize the data
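The notebook does not state which normalization it applies. A common choice for NMF, assumed here, is min-max scaling of each attribute to [0, 1], which also keeps all values nonnegative; missing entries are ignored when computing the range. `normalize_columns` is a hypothetical helper, and NMFk's own normalization may differ.

```julia
# Min-max normalization of each column to [0, 1], ignoring NaN entries.
# Assumes each column has at least one observed value.
function normalize_columns(X::AbstractMatrix{T}) where {T<:AbstractFloat}
    Xn = similar(X)
    for j in axes(X, 2)
        col = X[:, j]
        lo, hi = extrema(filter(!isnan, col))
        span = hi > lo ? hi - lo : one(T)   # constant column: avoid division by zero
        Xn[:, j] .= (col .- lo) ./ span     # NaN stays NaN
    end
    return Xn
end

X = Float32[1 10;
            3 NaN;
            5 30]
Xn = normalize_columns(X)
```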

Define the number of signatures to be explored

Define a directory where outputs should be stored

Define the number of NMF iterations (NMF random initial guess runs):
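The three settings above can be collected as plain variables. The names and values below are hypothetical placeholders, not the values used for the Utah analysis.

```julia
# Hypothetical run settings; the notebook's actual values may differ.
nkrange = 2:8                                         # numbers of signatures k to explore
resultdir = joinpath(tempdir(), "geothermal-utah")    # where outputs are stored
nruns = 100                                           # random initial guesses per k

mkpath(resultdir)   # create the output directory if it does not exist
```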

Run NMFk on normalized data
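To make this step concrete, here is a minimal sketch of nonnegative matrix factorization with missing data, using the standard weighted multiplicative updates with a 0/1 mask, plus several random initial guesses from which the best fit is kept. This is a generic illustration, not the NMFk implementation; NMFk also clusters the solutions from all runs to assess their robustness, which this sketch omits. `masked_nmf` and `best_of_runs` are hypothetical names.

```julia
using Random

# NMF of X ≈ W * H (all nonnegative) where NaN entries of X are treated as missing.
# Weighted multiplicative updates with a 0/1 mask M (1 = observed).
function masked_nmf(X::AbstractMatrix{Float64}, k::Integer, rng::AbstractRNG; iters::Integer=500)
    M = Float64.(.!isnan.(X))            # observation mask
    MX = ifelse.(isnan.(X), 0.0, X)      # zero-filled data, equal to M .* X without NaN
    n, m = size(X)
    W = rand(rng, n, k)                  # random nonnegative initial guess
    H = rand(rng, k, m)
    ϵ = 1e-12                            # avoids division by zero
    for _ in 1:iters
        W .*= (MX * H') ./ ((M .* (W * H)) * H' .+ ϵ)
        H .*= (W' * MX) ./ (W' * (M .* (W * H)) .+ ϵ)
    end
    err = sqrt(sum(abs2, M .* (MX .- W * H)))   # Frobenius error over observed entries
    return W, H, err
end

# Run several random initial guesses and keep the factorization with the lowest error.
function best_of_runs(X::AbstractMatrix{Float64}, k::Integer; nruns::Integer=10, seed::Integer=1)
    rng = MersenneTwister(seed)
    best = masked_nmf(X, k, rng)
    for _ in 2:nruns
        candidate = masked_nmf(X, k, rng)
        if candidate[3] < best[3]
            best = candidate
        end
    end
    return best
end

# Toy rank-1 data with two missing entries.
X = [1.0 2.0 NaN;
     2.0 4.0 6.0;
     3.0 NaN 9.0]
W, H, err = best_of_runs(X, 1; nruns=5)
```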

Get the acceptable solutions within the explored range of numbers of signatures:

Get the optimal number of signatures:
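The selection logic can be sketched as follows. The robustness values are hypothetical (for example, silhouette widths of the clustered solutions), and the rule used here, taking the largest number of signatures whose robustness exceeds an assumed threshold, is a simplified stand-in for NMFk's own criterion.

```julia
# Hypothetical robustness for each explored number of signatures k.
nkrange    = 2:6
robustness = [0.95, 0.90, 0.82, 0.31, 0.12]

# Assumed rule: among acceptable k (robustness above the threshold),
# choose the largest one; return nothing if no k is acceptable.
function optimal_k(ks, robustness; threshold=0.5)
    acceptable = [k for (k, r) in zip(ks, robustness) if r > threshold]
    return isempty(acceptable) ? nothing : maximum(acceptable)
end

kopt = optimal_k(nkrange, robustness)   # → 4
```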

Plot the fit and robustness of the solution

Analysis of the optimal solution

Associations of the data attributes with the extracted signatures
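One simple way to read these associations, sketched here under assumptions, is to assign each attribute to the signature with the largest loading in H (signatures × attributes). Because NMF factors are only defined up to a scaling (W * D and D⁻¹ * H give the same product for any positive diagonal D), each signature row is first scaled to a maximum of 1. The matrix below is hypothetical.

```julia
# Hypothetical signature matrix H (2 signatures × 3 attributes).
H = [0.9 0.1 0.4;
     0.2 0.8 0.5]
attributes = ["TDS", "Al", "pH"]

# Scale each signature row to a maximum of 1, then assign every attribute
# to the signature with the largest scaled loading.
Hn = H ./ maximum(H; dims=2)
assignment = [argmax(Hn[:, j]) for j in axes(Hn, 2)]

for (a, s) in zip(attributes, assignment)
    println(a, " → signature ", s)
end
```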

Associations of the data locations with the extracted signatures
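Similarly, each location can be labeled with its dominant signature using the rows of W (locations × signatures). A sketch with a hypothetical weight matrix, scaling each signature column to a maximum of 1 before comparing:

```julia
# Hypothetical location weights W (4 locations × 2 signatures).
W = [0.7 0.1;
     0.2 0.9;
     0.5 0.4;
     0.0 0.3]

# Scale each signature column to a maximum of 1, then label every location
# with the signature that dominates it.
Wn = W ./ maximum(W; dims=1)
labels = [argmax(Wn[i, :]) for i in axes(Wn, 1)]

# Number of locations assigned to each signature.
counts = [count(==(s), labels) for s in axes(W, 2)]
```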